Search for: All records

Creators/Authors contains: "Baroud, Hiba"


  1. Free, publicly-accessible full text available March 1, 2025
  2. Abstract

    As researchers collect large amounts of data in the social sciences through household surveys, challenges may arise in how best to analyze such datasets, especially where motivating theories are unclear or conflicting. New analytical methods may be necessary to extract information from these datasets. Machine learning techniques are promising methods for identifying patterns in large datasets, but have not yet been widely used to identify important variables in social surveys with many questions. To demonstrate the potential of machine learning to analyze large social datasets, we apply machine learning techniques to the study of migration in Bangladesh. The complexity of migration decisions makes them suitable for analysis with machine learning techniques, which enable pattern identification in large datasets with many covariates. In this paper, we apply random forest methods to analyze a large survey that captures approximately 2000 variables from approximately 1700 households in southwestern Bangladesh. Our analysis ranked the covariates in the dataset in terms of their predictive power for migration decisions. The results identified the most important covariates, but there is a tradeoff between predictive ability and interpretability. To address this tradeoff, random forests and other machine learning algorithms may be especially useful in combination with more traditional regression methods. To develop insights into how the important variables identified by the random forest algorithm impact migration, we performed a survival analysis of household time to first migration. With this combined analysis, we found that variables related to wealth and household composition are important predictors of migration. Such multi-method approaches may help to shed light on factors contributing to migration and non-migration.

     
  4. A Bayesian parameter estimation methodology for updating the distributions of the duration-based variables used in post-earthquake building recovery modeling is presented. The distributions of the recovery-related parameters specified in the resilience-based earthquake design initiative (REDi) and HAZUS are used as the basis of the priors. A data set of observed building damage and recovery following the 2014 South Napa earthquake is assembled and used to illustrate the proposed methodology. The recovery data set includes the permit acquisition and repair time for over 800 buildings affected by the earthquake. With these data, the conjugate prior (CP) and Markov Chain Monte Carlo (MCMC) methods are implemented to update the probability distribution parameters for the duration-based recovery variables. While the CP approach is easier to implement because it offers an analytical solution, the MCMC approach provides more flexibility in terms of the types of prior and sampling distributions that can be accommodated. Moreover, the results from a comparative implementation on the Napa data set show that the MCMC method provides a reasonable approximation of the posterior marginal distribution of the duration-based recovery variables relative to the CP analytical solution.

     
  6. Principled decision making in emergency response management necessitates the use of statistical models that predict the spatial-temporal likelihood of incident occurrence. These statistical models are then used for proactive stationing, which allocates first responders across the spatial area in order to reduce overall response time. Traditional methods that simply aggregate past incidents over space and time fail to make useful short-term predictions when the spatial region is large and focused on fine-grained spatial entities like interstate highway networks. This is partially due to the sparsity of incidents with respect to the area in consideration. Further, accidents are affected by several covariates, and collecting, cleaning, and managing multiple streams of data from various sources is challenging for large spatial areas. In this paper, we highlight how this problem is being solved for the state of Tennessee, a state in the USA with a total area of over 100,000 sq. km. Our pipeline, based on a combination of synthetic resampling, non-spatial clustering, and learning from data, can efficiently forecast the spatial and temporal dynamics of accident occurrence, even under sparse conditions. We describe how the pipeline uses data related to roadway geometry, weather, historical accidents, and real-time traffic congestion to aid accident forecasting. To understand how our forecasting model can affect allocation and dispatch, we improve upon a classical resource allocation approach. Experimental results show that our approach can significantly reduce response times in the field in comparison with current approaches followed by first responders.
  7. Modeling the resilience of interdependent critical infrastructure (ICI) requires a careful assessment of interdependencies as these systems are becoming increasingly interconnected. The interdependent connections across ICIs are often subject to uncertainty due to the lack of relevant data. Yet, this uncertainty has not been properly characterized. This paper develops an approach to model the resilience of ICIs founded in probabilistic graphical models. The uncertainty of interdependency links between ICIs is modeled using stochastic block models (SBMs). Specifically, the approach estimates the probability of links between individual systems considered as blocks in the SBM. The proposed model employs several attributes as predictors. Two recovery strategies based on static and dynamic component importance ranking are developed and compared. The proposed approach is illustrated with a case study of the interdependent water and power networks in Shelby County, TN. Results show that the probability of interdependency links varies depending on the predictors considered in the estimation. Accounting for the uncertainty in interdependency links allows for a dynamic recovery process. A recovery strategy based on dynamically updated component importance ranking accelerates recovery, thereby improving the resilience of ICIs.